{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f4c59543-f758-4210-aae1-8ffa175f3e3d",
   "metadata": {},
   "source": [
    "[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/pixeltable/pixeltable/blob/master/docs/release/tutorials/rag-operations.ipynb)&nbsp;&nbsp;\n",
    "[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixeltable/pixeltable/blob/master/docs/release/tutorials/rag-operations.ipynb)\n",
    "\n",
    "# RAG Operations in Pixeltable\n",
    "\n",
    "In this tutorial, we'll explore Pixeltable's flexible handling of RAG operations on unstructured text. In a traditional AI workflow, such operations might be implemented as a Python script that runs on a periodic schedule or in response to certain events. In Pixeltable, as with everything else, they are implemented as persistent table operations that update incrementally as new data becomes available. In our tutorial workflow, we'll chunk Wikipedia articles in various ways with a document splitter, then apply several kinds of embeddings to the chunks.\n",
    "\n",
    "## Set Up the Table Structure\n",
    "\n",
    "We start by installing the necessary dependencies, creating a Pixeltable directory `rag_ops_demo` (if it doesn't already exist), and setting up the table structure for our new workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f54bd405-7f63-46c3-8892-5ffbcccd43d3",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -q pixeltable sentence-transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a6e5554a-440e-4573-be78-624d012948de",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata\n",
      "Created directory `rag_ops_demo`.\n"
     ]
    }
   ],
   "source": [
    "import pixeltable as pxt\n",
    "\n",
    "# Create the Pixeltable workspace\n",
    "pxt.drop_dir('rag_ops_demo', force=True)  # Ensure a clean slate for the demo\n",
    "pxt.create_dir('rag_ops_demo')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90639ba7",
   "metadata": {},
   "source": [
    "## Creating Tables and Views\n",
    "\n",
    "Now we'll create the tables that represent our workflow, starting with a table to hold references to source documents. The table contains a single column `source_doc` whose elements have type `pxt.DocumentType`, representing a general document instance. In this tutorial, we'll be working with HTML documents, but Pixeltable supports a range of other document types, such as Markdown and PDF."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c8371827",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created table `docs`.\n"
     ]
    }
   ],
   "source": [
    "docs = pxt.create_table('rag_ops_demo.docs', {'source_doc': pxt.DocumentType()})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "275d07fd-1f16-47b4-b42a-39c613c3bf5c",
   "metadata": {},
   "source": [
    "If we take a peek at the `docs` table, we see its very simple structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "d0644d63-991f-4bf4-82d3-7f887eadadbb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_54e9b th {\n",
       "  text-align: center;\n",
       "}\n",
       "#T_54e9b_row0_col0, #T_54e9b_row0_col1, #T_54e9b_row0_col2 {\n",
       "  white-space: pre-wrap;\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_54e9b\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_54e9b_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
       "      <th id=\"T_54e9b_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
       "      <th id=\"T_54e9b_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_54e9b_row0_col0\" class=\"data row0 col0\" >source_doc</td>\n",
       "      <td id=\"T_54e9b_row0_col1\" class=\"data row0 col1\" >document</td>\n",
       "      <td id=\"T_54e9b_row0_col2\" class=\"data row0 col2\" ></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "table 'docs'\n",
       "\n",
       "Column Name     Type Computed With\n",
       " source_doc document              "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e1a701e-5f9d-4f56-a306-c31ff9da6ee7",
   "metadata": {},
   "source": [
    "Next we create a view to represent chunks of our HTML documents. A Pixeltable view is a virtual table, which is dynamically derived from a source table by applying a transformation and/or selecting a subset of data. In this case, our view represents a one-to-many transformation from source documents into individual sentences. This is achieved using Pixeltable's built-in `DocumentSplitter` class.\n",
    "\n",
    "Note that the `docs` table is currently empty, so creating this view doesn't actually *do* anything yet: it simply defines an operation that we want Pixeltable to execute when it sees new data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d074b305",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created view `sentences` with 0 rows, 0 exceptions.\n"
     ]
    }
   ],
   "source": [
    "from pixeltable.iterators.document import DocumentSplitter\n",
    "\n",
    "sentences = pxt.create_view(\n",
    "    'rag_ops_demo.sentences',  # Name of the view\n",
    "    docs,  # Table from which the view is derived\n",
    "    iterator=DocumentSplitter.create(\n",
    "        document=docs.source_doc,\n",
    "        separators='sentence',  # Chunk docs into sentences\n",
    "        metadata='title,heading,sourceline'\n",
    "    )\n",
    ")"
   ]
  },
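  {
   "cell_type": "markdown",
   "id": "sketch-sentence-rows",
   "metadata": {},
   "source": [
    "Conceptually, the `sentences` view applies a one-to-many function to each row of `docs`: one document goes in, and many sentence rows come out, each tagged with its position. The pure-Python sketch below illustrates only that shape. It uses a hypothetical `split_sentences` helper with a naive regex and is not how `DocumentSplitter` is implemented; the real splitter also parses the HTML and extracts metadata such as headings.\n",
    "\n",
    "```python\n",
    "import re\n",
    "from typing import Iterator\n",
    "\n",
    "\n",
    "def split_sentences(doc_text: str) -> Iterator[dict]:\n",
    "    # Hypothetical stand-in for a sentence splitter: break after\n",
    "    # '.', '!' or '?' followed by whitespace. A production splitter would\n",
    "    # need to handle abbreviations such as 'c. 1920'; this sketch does not.\n",
    "    parts = re.split(r'(?<=[.!?])\\s+', doc_text.strip())\n",
    "    # Each output row records its position within the source document.\n",
    "    for pos, sentence in enumerate(p for p in parts if p):\n",
    "        yield {'pos': pos, 'text': sentence}\n",
    "\n",
    "\n",
    "rows = list(split_sentences('Chagall was born in 1887. He moved to Paris! Did he return?'))\n",
    "for row in rows:\n",
    "    print(row)  # three rows with pos 0, 1, 2\n",
    "```"
   ]
  },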
  {
   "cell_type": "markdown",
   "id": "9badf052-7e8d-43b2-9aea-0b20a9409a83",
   "metadata": {},
   "source": [
    "Let's take a peek at the new `sentences` view."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "29b07709",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_65f33 th {\n",
       "  text-align: center;\n",
       "}\n",
       "#T_65f33_row0_col0, #T_65f33_row0_col1, #T_65f33_row0_col2, #T_65f33_row1_col0, #T_65f33_row1_col1, #T_65f33_row1_col2, #T_65f33_row2_col0, #T_65f33_row2_col1, #T_65f33_row2_col2, #T_65f33_row3_col0, #T_65f33_row3_col1, #T_65f33_row3_col2, #T_65f33_row4_col0, #T_65f33_row4_col1, #T_65f33_row4_col2, #T_65f33_row5_col0, #T_65f33_row5_col1, #T_65f33_row5_col2 {\n",
       "  white-space: pre-wrap;\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_65f33\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_65f33_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
       "      <th id=\"T_65f33_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
       "      <th id=\"T_65f33_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_65f33_row0_col0\" class=\"data row0 col0\" >pos</td>\n",
       "      <td id=\"T_65f33_row0_col1\" class=\"data row0 col1\" >int</td>\n",
       "      <td id=\"T_65f33_row0_col2\" class=\"data row0 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_65f33_row1_col0\" class=\"data row1 col0\" >text</td>\n",
       "      <td id=\"T_65f33_row1_col1\" class=\"data row1 col1\" >string</td>\n",
       "      <td id=\"T_65f33_row1_col2\" class=\"data row1 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_65f33_row2_col0\" class=\"data row2 col0\" >title</td>\n",
       "      <td id=\"T_65f33_row2_col1\" class=\"data row2 col1\" >string</td>\n",
       "      <td id=\"T_65f33_row2_col2\" class=\"data row2 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_65f33_row3_col0\" class=\"data row3 col0\" >heading</td>\n",
       "      <td id=\"T_65f33_row3_col1\" class=\"data row3 col1\" >json</td>\n",
       "      <td id=\"T_65f33_row3_col2\" class=\"data row3 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_65f33_row4_col0\" class=\"data row4 col0\" >sourceline</td>\n",
       "      <td id=\"T_65f33_row4_col1\" class=\"data row4 col1\" >int</td>\n",
       "      <td id=\"T_65f33_row4_col2\" class=\"data row4 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_65f33_row5_col0\" class=\"data row5 col0\" >source_doc</td>\n",
       "      <td id=\"T_65f33_row5_col1\" class=\"data row5 col1\" >document</td>\n",
       "      <td id=\"T_65f33_row5_col2\" class=\"data row5 col2\" ></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "view 'sentences'\n",
       "\n",
       "Column Name     Type Computed With\n",
       "        pos      int              \n",
       "       text   string              \n",
       "      title   string              \n",
       "    heading     json              \n",
       " sourceline      int              \n",
       " source_doc document              "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sentences"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9641a8a0-a7fe-46b8-a2e8-3589e0df2bae",
   "metadata": {},
   "source": [
    "We see that `sentences` inherits the `source_doc` column from `docs`, together with some new fields:\n",
    "- `pos`: The position in the source document where the sentence appears.\n",
    "-  `text`: The text of the sentence.\n",
    "- `title`, `heading`, and `sourceline`: The metadata we requested when we set up the view."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e53392cb",
   "metadata": {},
   "source": [
    "## Data Ingestion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0aec54a3-f5a8-483e-b3b9-c9111929ec74",
   "metadata": {},
   "source": [
    "Ok, now it's time to insert some data into our workflow. A document in Pixeltable is just a URL; the following command inserts a single row into the `docs` table with the `source_doc` field set to the specified URL:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a718a299",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `docs`: 1 rows [00:00, 1013.12 rows/s]\n",
      "Inserting rows into `sentences`: 1460 rows [00:00, 3129.84 rows/s]\n",
      "Inserted 1461 rows with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "UpdateStatus(num_rows=1461, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs.insert(source_doc='https://en.wikipedia.org/wiki/Marc_Chagall')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "542f97cd-08b9-4001-9d38-10f47bfdd5d0",
   "metadata": {},
   "source": [
    "We can see that two things happened. First, a single row was inserted into `docs`, containing the URL representing our source document. Then, the view `sentences` was incrementally updated by applying the `DocumentSplitter` according to the definition of the view. This illustrates an important principle in Pixeltable: by default, anytime Pixeltable sees new data, the update is incrementally propagated to any downstream views or computed columns.\n",
    "\n",
    "We can see the effect of the insertion with the `select` command. There's a single row in `docs`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0bbe8e86-3cec-499f-a30f-58c5073ab42b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>source_doc_fileurl</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>https://en.wikipedia.org/wiki/Marc_Chagall</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                           source_doc_fileurl\n",
       "0  https://en.wikipedia.org/wiki/Marc_Chagall"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "docs.select(docs.source_doc.fileurl).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2dd8f07-6f5c-426b-a829-51dd8e26f888",
   "metadata": {},
   "source": [
    "And here are the first 20 rows in `sentences`. The content of the article is broken into individual sentences, as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "42199e16-559f-4827-940c-1726f83452e5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>text</th>\n",
       "      <th>heading</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Marc Chagall - Wikipedia Jump to content Search Search</td>\n",
       "      <td>{}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall 81 languages Afrikaans Alemannisch العربية</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) &quot;Chagall&quot; redirects here.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>For other uses, see Chagall (disambiguation) .</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall Chagall, c. 1920</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus)</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>[1] Died 28 March 1985 (1985-03-28) (aged 97)</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Saint-Paul-de-Vence , France Nationality Russian Empire,</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>later French</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>[2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944)</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>​ Valentina (Vava) Brodsky ​ ​ ( m. 1952)</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>​</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>[3] Children 2</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>[4]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>[a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>[b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                                 text                heading\n",
       "0   Marc Chagall - Wikipedia Jump to content Searc...                     {}\n",
       "1   Marc Chagall 81 languages Afrikaans Alemannisc...  {'1': 'Marc Chagall'}\n",
       "2   Aragonés Արեւմտահայերէն Asturianu Azərbaycanca...  {'1': 'Marc Chagall'}\n",
       "3   Hrvatski Ido Bahasa Indonesia Interlingua Ital...  {'1': 'Marc Chagall'}\n",
       "4      Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש  {'1': 'Marc Chagall'}\n",
       "5   粵語 中文 Edit links From Wikipedia, the free ency...  {'1': 'Marc Chagall'}\n",
       "6      For other uses, see Chagall (disambiguation) .  {'1': 'Marc Chagall'}\n",
       "7                       Marc Chagall Chagall, c. 1920  {'1': 'Marc Chagall'}\n",
       "8   Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 ...  {'1': 'Marc Chagall'}\n",
       "9       [1] Died 28 March 1985 (1985-03-28) (aged 97)  {'1': 'Marc Chagall'}\n",
       "10  Saint-Paul-de-Vence , France Nationality Russi...  {'1': 'Marc Chagall'}\n",
       "11                                       later French  {'1': 'Marc Chagall'}\n",
       "12  [2] Known for Painting stained glass Notable w...  {'1': 'Marc Chagall'}\n",
       "13          ​ Valentina (Vava) Brodsky ​ ​ ( m. 1952)  {'1': 'Marc Chagall'}\n",
       "14                                                  ​  {'1': 'Marc Chagall'}\n",
       "15                                     [3] Children 2  {'1': 'Marc Chagall'}\n",
       "16                                                [4]  {'1': 'Marc Chagall'}\n",
       "17                                       Marc Chagall  {'1': 'Marc Chagall'}\n",
       "18  [a] (born Moishe Shagal ; 6 July [ O.S. 24 Jun...  {'1': 'Marc Chagall'}\n",
       "19  [b] An early modernist , he was associated wit...  {'1': 'Marc Chagall'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sentences.select(sentences.text, sentences.heading).show(20)"
   ]
  },
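  {
   "cell_type": "markdown",
   "id": "sketch-incremental-views",
   "metadata": {},
   "source": [
    "The incremental behavior we saw when inserting the Chagall article can be pictured with a small in-memory model. In the sketch below, the hypothetical `ToyTable` and `ToyView` classes (not part of Pixeltable) push each newly inserted base row through every registered view's iterator, and a view created later is first populated from the rows already present. The returned count mirrors the insert above, which reported 1,461 rows: 1 document plus 1,460 sentences.\n",
    "\n",
    "```python\n",
    "from typing import Callable, Iterable\n",
    "\n",
    "\n",
    "class ToyTable:\n",
    "    # Hypothetical in-memory base table; not Pixeltable's API.\n",
    "    def __init__(self) -> None:\n",
    "        self.rows: list[dict] = []\n",
    "        self.views: list = []\n",
    "\n",
    "    def insert(self, **row) -> int:\n",
    "        self.rows.append(row)\n",
    "        # Propagate only the new row to each downstream view.\n",
    "        return 1 + sum(view.on_insert(row) for view in self.views)\n",
    "\n",
    "\n",
    "class ToyView:\n",
    "    # Hypothetical one-to-many view: `iterator` maps one base row to many rows.\n",
    "    def __init__(self, base: ToyTable, iterator: Callable[[dict], Iterable[dict]]) -> None:\n",
    "        self.iterator = iterator\n",
    "        self.rows: list[dict] = []\n",
    "        base.views.append(self)\n",
    "        # Initial population from the rows already in the base table.\n",
    "        for row in base.rows:\n",
    "            self.on_insert(row)\n",
    "\n",
    "    def on_insert(self, base_row: dict) -> int:\n",
    "        # View rows inherit the base row's columns, like `source_doc` above.\n",
    "        new_rows = [{**base_row, **out} for out in self.iterator(base_row)]\n",
    "        self.rows.extend(new_rows)\n",
    "        return len(new_rows)\n",
    "\n",
    "\n",
    "def words(row: dict):\n",
    "    # Stand-in iterator: one output row per whitespace-separated word.\n",
    "    for pos, word in enumerate(row['source_doc'].split()):\n",
    "        yield {'pos': pos, 'text': word}\n",
    "\n",
    "\n",
    "toy_docs = ToyTable()\n",
    "view = ToyView(toy_docs, words)\n",
    "total = toy_docs.insert(source_doc='a b c')\n",
    "print(total)  # 4: one base row plus three view rows\n",
    "late_view = ToyView(toy_docs, words)\n",
    "print(len(late_view.rows))  # 3: populated from existing data\n",
    "```"
   ]
  },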
  {
   "cell_type": "markdown",
   "id": "9e3abbdd",
   "metadata": {},
   "source": [
    "## Experimenting with Chunking"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d229545-79b6-4fdd-be54-f786669533eb",
   "metadata": {},
   "source": [
    "Of course, chunking into sentences isn't the only way to split a document. Perhaps we want to experiment with different chunking methodologies, in order to see which one performs best in a particular application. Pixeltable makes it easy to do this, by creating several views of the same source table. Here are a few examples. Notice that as each new view is created, it is initially populated from the data already in `docs`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "887389a1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `chunks`: 205 rows [00:00, 12033.37 rows/s]\n",
      "Created view `chunks` with 205 rows, 0 exceptions.\n"
     ]
    }
   ],
   "source": [
    "chunks = pxt.create_view(\n",
    "    'rag_ops_demo.chunks', docs,\n",
    "    iterator=DocumentSplitter.create(\n",
    "        document=docs.source_doc,\n",
    "        separators='paragraph,token_limit',\n",
    "        limit=2048,\n",
    "        overlap=0,\n",
    "        metadata='title,heading,sourceline'\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "fd4c8b8b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `short_chunks`: 531 rows [00:00, 13029.98 rows/s]\n",
      "Created view `short_chunks` with 531 rows, 0 exceptions.\n"
     ]
    }
   ],
   "source": [
    "short_chunks = pxt.create_view(\n",
    "    'rag_ops_demo.short_chunks', docs,\n",
    "    iterator=DocumentSplitter.create(\n",
    "        document=docs.source_doc,\n",
    "        separators='paragraph,token_limit',\n",
    "        limit=72,\n",
    "        overlap=0,\n",
    "        metadata='title,heading,sourceline'\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "096df773",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `short_char_chunks`: 1764 rows [00:00, 11451.19 rows/s]\n",
      "Created view `short_char_chunks` with 1764 rows, 0 exceptions.\n"
     ]
    }
   ],
   "source": [
    "short_char_chunks = pxt.create_view(\n",
    "    'rag_ops_demo.short_char_chunks', docs,\n",
    "    iterator=DocumentSplitter.create(\n",
    "        document=docs.source_doc,\n",
    "        separators='paragraph,char_limit',\n",
    "        limit=72,\n",
    "        overlap=0,\n",
    "        metadata='title,heading,sourceline'\n",
    "    )\n",
    ")"
   ]
  },
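  {
   "cell_type": "markdown",
   "id": "sketch-paragraph-limit",
   "metadata": {},
   "source": [
    "The three views above differ only in their `separators` and `limit` settings: `chunks` caps paragraphs at 2048 tokens, `short_chunks` at 72 tokens, and `short_char_chunks` at 72 characters. Since 72 characters is far less text than 72 tokens, the character-limited view produces the most rows (1,764). The pure-Python sketch below, built around a hypothetical `split_paragraph_limit` helper, illustrates the idea of splitting on paragraph boundaries first and then enforcing a size cap with no overlap. It approximates tokens with whitespace-separated words and cuts at exact character offsets; both are assumptions, and Pixeltable's own token counting and break points may differ.\n",
    "\n",
    "```python\n",
    "def split_paragraph_limit(text: str, limit: int, unit: str = 'char') -> list[str]:\n",
    "    # Hypothetical sketch: split on blank lines (paragraphs), then cap each\n",
    "    # paragraph at `limit` units with no overlap. unit='char' counts characters;\n",
    "    # unit='token' approximates tokens by whitespace-separated words.\n",
    "    chunks: list[str] = []\n",
    "    for para in (p.strip() for p in text.split('\\n\\n')):\n",
    "        if not para:\n",
    "            continue\n",
    "        if unit == 'char':\n",
    "            chunks.extend(para[i:i + limit] for i in range(0, len(para), limit))\n",
    "        else:\n",
    "            words = para.split()\n",
    "            chunks.extend(' '.join(words[i:i + limit]) for i in range(0, len(words), limit))\n",
    "    return chunks\n",
    "\n",
    "\n",
    "sample = 'Chagall was born in 1887.\\n\\nHe later settled in France.'\n",
    "print(split_paragraph_limit(sample, 10))  # 6 chunks, some cut mid-word\n",
    "print(split_paragraph_limit(sample, 3, unit='token'))\n",
    "# ['Chagall was born', 'in 1887.', 'He later settled', 'in France.']\n",
    "```\n",
    "\n",
    "Because no chunk in this sketch crosses a paragraph boundary, short paragraphs such as headings remain their own chunks regardless of the limit."
   ]
  },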
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d8289f0a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>text</th>\n",
       "      <th>heading</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Marc Chagall - Wikipedia Jump to content Search Search</td>\n",
       "      <td>{}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) &quot;Chagall&quot; redirects here. For other uses, see Chagall (disambiguation) .</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Valentina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Children 2 [4]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country&#x27;s most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Art critic Robert Hughes referred to Chagall as &quot;the quintessential Jewish artist of the twentieth century&quot;. According to art historian Michael J. Lewis, Chagall was considered to be &quot;the last survivor of the first generation of European modernists&quot;. For decades, he &quot;had also been respected as the world&#x27;s pre-eminent Jewish artist&quot;. [15] Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and th ...... e experienced modernism&#x27;s &quot;golden age&quot; in Paris, where &quot;he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism &quot;. Yet throughout these phases of his style &quot;he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of life in his native village of Vitebsk.&quot; [16] &quot;When Matisse dies&quot;, Pablo Picasso remarked in the 1950s, &quot;Chagall will be the only painter left who understands what colour really is&quot;. [17]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Early life and education [ edit ]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Early life [ edit ] Marc Chagall&#x27;s childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, then part of the Russian Empire . [c] [18] At the time of his birth, Vitebsk&#x27;s population was about 66,000. Half of the population was Jewish. [16] A picturesque city of churches and synagogues, it was called &quot;Russian Toledo &quot; by artist Ilya Repin , after the cosmopolitan city of the former Spanish Empire . [19] Because the city was built mostly of wood, little of it survived years of occupation and destruction during World War II.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish community was usually borne by a Levitic family. [20] His father, Khatskl (Zachar) Shagal, was employed by a herring merchant, and his mother, Feige-Ite, sold groceries from their home. His father worked hard, carrying heavy barrels, earning 20 roubles each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years:</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Day after day, winter and summer, at six o&#x27;clock in the morning, my father got up and went off to the synagogue. There he said his usual prayer for some dead man or other. On his return he made ready the samovar , drank some tea and went to work. Hellish work, the work of a galley-slave. Why try to hide it? How tell about it? No word will ever ease my father&#x27;s lot... There was always plenty of butter and cheese on our table. Buttered bread, like an eternal symbol, was never out of my childish hands. [21]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>One of the main sources of income for the Jewish population of the town was from the manufacture of clothing that was sold throughout the Russian Empire. They also made furniture and various agricultural tools. [22] From the late 18th century to the First World War, the Imperial Russian government confined Jews to living within the Pale of Settlement , which included modern Ukraine, Belarus, Poland, Lithuania, and Latvia, almost exactly corresponding to the territory of the Polish-Lithuanian Commonwealth which was taken over by Imperial Russia in the late 18th century. That led to the creation of Jewish market-villages ( shtetls ) throughout today&#x27;s Eastern Europe, with their own markets, schools, hospitals, and other community institutions. [23] : 14</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall wrote as a boy; &quot;I felt at every step that I was a Jew—people made me feel it&quot;. [24] [25] During a pogrom , Chagall wrote that: &quot;The street lamps are out. I feel panicky, especially in front of butchers&#x27; windows. There you can see calves that are still alive lying beside the butchers&#x27; hatchets and knives&quot;. [25] [26] When asked by some pogromniks &quot;Jew or not?&quot;, Chagall remembered thinking: &quot;My pockets are empty, my fingers sensitive, my legs weak and they are out for blood. My death would be futile. I so wanted to live&quot;. [25] [26] Chagall denied being a Jew, leading the pogromniks to shout &quot;All right! Get along!&quot; [25] [26]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Most of what is known about Chagall&#x27;s early life has come from his autobiography, My Life . In it, he described the major influence that the culture of Hasidic Judaism had on his life as an artist. Chagall related how he realised that the Jewish traditions in which he had grown up were fast disappearing and that he needed to document them. From the 1730s, Vitebsk itself had been a centre of that culture, with its teachings derived from the Kabbalah . Chagall scholar, Susan Tumarkin Goodman, describes the links and sources of his art to his early home:</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall&#x27;s art can be understood as the response to a situation that has long marked the history of Russian Jews. Though they were cultural innovators who made important contributions to the broader society, Jews were considered outsiders in a frequently hostile society ... Chagall himself was born of a family steeped in religious life; his parents were observant Hasidic Jews who found spiritual satisfaction in a life defined by their faith and organized by prayer. [23] : 14</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Art education [ edit ] Portrait of Chagall by Yehuda Pen , his first art teacher in Vitebsk</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Art education[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>In the Russian Empire at that time, Jewish children were not allowed to attend regular schools and universities imposed a quota on Jews . Their movement within the city was also restricted. Chagall therefore received his primary education at the local Jewish religious school, where he studied Hebrew and the Bible. At the age of 13, his mother tried to enrol him in a regular high school, and he recalled: &quot;But in that school, they don&#x27;t take Jews. Without a moment&#x27;s hesitation, my courageous mother walks up to a professor.&quot; She offered the headmaster 50 roubles to let him attend, which he accepted. [21]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Art education[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>A turning point of his artistic life came when he first noticed a fellow student drawing. Baal-Teshuva writes that, for the young Chagall, watching someone draw &quot;was like a vision, a revelation in black and white&quot;. Chagall would later say that there was no art of any kind in his family&#x27;s home and the concept was totally alien to him. When Chagall asked the schoolmate how he learned to draw, his friend replied, &quot;Go and find a book in the library, idiot, choose any picture you like, and just copy it&quot;. He soon began copying images from books and found the experience so rewarding he then decided he wanted to become an artist. [22]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Art education[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Goodman writes that Chagall eventually confided to his mother, &quot;I want to be a painter&quot;, although she could not yet understand his sudden interest in art or why he would choose a vocation that &quot;seemed so impractical&quot;. The young Chagall explained: &quot;There&#x27;s a place in town; if I&#x27;m admitted and if I complete the course, I&#x27;ll come out a regular artist. I&#x27;d be so happy!&quot; It was 1906, and he had noticed the studio of Yehuda (Yuri) Pen , a realist artist who operated a drawing school in Vitebsk. At the same time, future artists El Lissitzky and Ossip Zadkine were also Pen&#x27;s students. Due to Chagall&#x27;s youth and lack of income, Pen offered to teach him free of charge. However, after a few months at the school, Chagall realized that academic portrait painting did not suit him. [22]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Art education[edit]&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Artistic inspiration [ edit ] Marc Chagall, 1912, Calvary ( Golgotha ) , oil on canvas, 174.6 × 192.4 cm, Museum of Modern Art , New York. Alternative titles: Kreuzigung Bild 2 Christus gewidmet [Golgotha. Crucifixion. Dedicated to Christ] . Sold through Galerie Der Sturm (Herwarth Walden), Berlin to Bernhard Koehler (1849–1927), Berlin, 1913. Exhibited: Erster Deutscher Herbstsalon , Berlin, 1913</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Artistic inspiration[edit]&quot;}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                                 text  \\\n",
       "0   Marc Chagall - Wikipedia Jump to content Searc...   \n",
       "1   Marc Chagall 81 languages Afrikaans Alemannisc...   \n",
       "2   Marc Chagall Chagall, c. 1920 Born Moishe Shag...   \n",
       "3   Marc Chagall [a] (born Moishe Shagal ; 6 July ...   \n",
       "4   Chagall was born in 1887, into a Jewish family...   \n",
       "5   Art critic Robert Hughes referred to Chagall a...   \n",
       "6                   Early life and education [ edit ]   \n",
       "7   Early life [ edit ] Marc Chagall's childhood h...   \n",
       "8   Marc Chagall was born Moishe Shagal in 1887, i...   \n",
       "9   Chagall was the eldest of nine children. The f...   \n",
       "10  Day after day, winter and summer, at six o'clo...   \n",
       "11  One of the main sources of income for the Jewi...   \n",
       "12  Chagall wrote as a boy; \"I felt at every step ...   \n",
       "13  Most of what is known about Chagall's early li...   \n",
       "14  Chagall's art can be understood as the respons...   \n",
       "15  Art education [ edit ] Portrait of Chagall by ...   \n",
       "16  In the Russian Empire at that time, Jewish chi...   \n",
       "17  A turning point of his artistic life came when...   \n",
       "18  Goodman writes that Chagall eventually confide...   \n",
       "19  Artistic inspiration [ edit ] Marc Chagall, 19...   \n",
       "\n",
       "                                              heading  \n",
       "0                                                  {}  \n",
       "1                               {'1': 'Marc Chagall'}  \n",
       "2                               {'1': 'Marc Chagall'}  \n",
       "3                               {'1': 'Marc Chagall'}  \n",
       "4                               {'1': 'Marc Chagall'}  \n",
       "5                               {'1': 'Marc Chagall'}  \n",
       "6   {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "7   {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "8   {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "9   {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "10  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "11  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "12  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "13  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "14  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "15  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "16  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "17  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "18  {'1': 'Marc Chagall', '2': 'Early life and edu...  \n",
       "19  {'1': 'Marc Chagall', '2': 'Early life and edu...  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chunks.select(chunks.text, chunks.heading).show(20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "c0d453cf",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>text</th>\n",
       "      <th>heading</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Marc Chagall - Wikipedia Jump to content Search Search</td>\n",
       "      <td>{}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡ</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>ортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lë</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>tzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemont</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>èis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) &quot;Chagall&quot; redirects here</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>. For other uses, see Chagall (disambiguation) .</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Val</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>entina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Children 2 [4]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country&#x27;s most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Art critic Robert Hughes referred to Chagall as &quot;the quintessential Jewish artist of the twentieth century&quot;. According to art historian Michael J. Lewis, Chagall was considered to be &quot;the last survivor of the first generation of European modernists&quot;. For decades, he &quot;had also been respected as the world&#x27;s pre-eminent Jewish artist&quot;. [15]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and the Art Institute of Chicago and the Jerusalem Windows in Israel. He also did large-scale paintings, including part of the ceiling of the Paris Opéra . He experienced</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>modernism&#x27;s &quot;golden age&quot; in Paris, where &quot;he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism &quot;. Yet throughout these phases of his style &quot;he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>life in his native village of Vitebsk.&quot; [16] &quot;When Matisse dies&quot;, Pablo Picasso remarked in the 1950s, &quot;Chagall will be the only painter left who understands what colour really is&quot;. [17]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                                 text                heading\n",
       "0   Marc Chagall - Wikipedia Jump to content Searc...                     {}\n",
       "1   Marc Chagall 81 languages Afrikaans Alemannisc...  {'1': 'Marc Chagall'}\n",
       "2   ортса Беларуская Беларуская (тарашкевіца) Бълг...  {'1': 'Marc Chagall'}\n",
       "3   국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesi...  {'1': 'Marc Chagall'}\n",
       "4   tzebuergesch Lietuvių Magyar Македонски Malaga...  {'1': 'Marc Chagall'}\n",
       "5   èis Plattdüütsch Polski Português Română Runa ...  {'1': 'Marc Chagall'}\n",
       "6    Svenska ไทย Türkçe Українська Tiếng Việt Wina...  {'1': 'Marc Chagall'}\n",
       "7    . For other uses, see Chagall (disambiguation) .  {'1': 'Marc Chagall'}\n",
       "8   Marc Chagall Chagall, c. 1920 Born Moishe Shag...  {'1': 'Marc Chagall'}\n",
       "9   28) (aged 97) Saint-Paul-de-Vence , France Nat...  {'1': 'Marc Chagall'}\n",
       "10  entina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Chi...  {'1': 'Marc Chagall'}\n",
       "11  Marc Chagall [a] (born Moishe Shagal ; 6 July ...  {'1': 'Marc Chagall'}\n",
       "12   works in a wide range of artistic formats, in...  {'1': 'Marc Chagall'}\n",
       "13  Chagall was born in 1887, into a Jewish family...  {'1': 'Marc Chagall'}\n",
       "14   ideas of Eastern European and Jewish folklore...  {'1': 'Marc Chagall'}\n",
       "15   in 1923. During World War II , he escaped occ...  {'1': 'Marc Chagall'}\n",
       "16  Art critic Robert Hughes referred to Chagall a...  {'1': 'Marc Chagall'}\n",
       "17   Using the medium of stained glass, he produce...  {'1': 'Marc Chagall'}\n",
       "18   modernism's \"golden age\" in Paris, where \"he ...  {'1': 'Marc Chagall'}\n",
       "19   life in his native village of Vitebsk.\" [16] ...  {'1': 'Marc Chagall'}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "short_chunks.select(short_chunks.text, short_chunks.heading).show(20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "bb32ade8-9727-4ebc-b57a-ff8097aca993",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>text</th>\n",
       "      <th>heading</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Marc Chagall - Wikipedia Jump to content Search Search</td>\n",
       "      <td>{}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտա</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>հայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (та</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>рашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrva</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>tski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswah</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>ili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy م</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>صرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbe</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>kcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Ro</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>mână Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina S</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>lovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi S</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>venska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit lin</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>ks From Wikipedia, the free encyclopedia Russian-French artist (1887–198</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5) &quot;Chagall&quot; redirects here. For other uses, see Chagall (disambiguation</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>) .</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , Franc</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>e Nationality Russian Empire, later French [2] Known for Painting staine</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>d glass Notable work See list of artworks by Marc Chagall Movement Cubis</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                                 text                heading\n",
       "0   Marc Chagall - Wikipedia Jump to content Searc...                     {}\n",
       "1   Marc Chagall 81 languages Afrikaans Alemannisc...  {'1': 'Marc Chagall'}\n",
       "2   հայերէն Asturianu Azərbaycanca বাংলা Башҡортса...  {'1': 'Marc Chagall'}\n",
       "3   рашкевіца) Български Català Čeština Cymraeg Da...  {'1': 'Marc Chagall'}\n",
       "4    Español Esperanto Euskara فارسی Français Gale...  {'1': 'Marc Chagall'}\n",
       "5   tski Ido Bahasa Indonesia Interlingua Italiano...  {'1': 'Marc Chagall'}\n",
       "6   ili Latina Latviešu Lëtzebuergesch Lietuvių Ma...  {'1': 'Marc Chagall'}\n",
       "7   صرى Nederlands Nedersaksies 日本語 Norsk bokmål N...  {'1': 'Marc Chagall'}\n",
       "8   kcha / ўзбекча پنجابی Picard Piemontèis Plattd...  {'1': 'Marc Chagall'}\n",
       "9   mână Runa Simi Русский Scots Shqip Sicilianu S...  {'1': 'Marc Chagall'}\n",
       "10  lovenščina کوردی Српски / srpski Srpskohrvatsk...  {'1': 'Marc Chagall'}\n",
       "11  venska ไทย Türkçe Українська Tiếng Việt Winara...  {'1': 'Marc Chagall'}\n",
       "12  ks From Wikipedia, the free encyclopedia Russi...  {'1': 'Marc Chagall'}\n",
       "13  5) \"Chagall\" redirects here. For other uses, s...  {'1': 'Marc Chagall'}\n",
       "14                                                ) .  {'1': 'Marc Chagall'}\n",
       "15  Marc Chagall Chagall, c. 1920 Born Moishe Shag...  {'1': 'Marc Chagall'}\n",
       "16  887 (N.S.) Liozna , Vitebsk Governorate , Russ...  {'1': 'Marc Chagall'}\n",
       "17  1] Died 28 March 1985 (1985-03-28) (aged 97) S...  {'1': 'Marc Chagall'}\n",
       "18  e Nationality Russian Empire, later French [2]...  {'1': 'Marc Chagall'}\n",
       "19  d glass Notable work See list of artworks by M...  {'1': 'Marc Chagall'}"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "short_char_chunks.select(short_char_chunks.text, short_char_chunks.heading).show(20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "556d86ab-ef09-43e6-8bff-95caf8b8e45d",
   "metadata": {},
   "source": [
    "Now let's add a few more documents to our workflow. Notice how all of the downstream views are updated incrementally, processing just the new documents as they are inserted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "71735d6a-3f99-46ef-99d7-4783aa6840cb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `docs`: 3 rows [00:00, 2676.08 rows/s]\n",
      "Inserting rows into `sentences`: 2106 rows [00:03, 616.57 rows/s]\n",
      "Inserting rows into `chunks`: 276 rows [00:00, 11028.49 rows/s]\n",
      "Inserting rows into `short_chunks`: 812 rows [00:00, 15100.27 rows/s]\n",
      "Inserting rows into `short_char_chunks`: 2638 rows [00:00, 6835.79 rows/s]\n",
      "Inserted 5835 rows with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "UpdateStatus(num_rows=5835, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
     }
   ],
   "source": [
    "urls = [\n",
    "    'https://en.wikipedia.org/wiki/Pierre-Auguste_Renoir',\n",
    "    'https://en.wikipedia.org/wiki/Henri_Matisse',\n",
    "    'https://en.wikipedia.org/wiki/Marcel_Duchamp'\n",
    "]\n",
    "docs.insert({'source_doc': url} for url in urls)"
   ]
  },
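  {
   "cell_type": "markdown",
   "id": "rag-ops-incremental-toy",
   "metadata": {},
   "source": [
    "To make the incremental behavior more concrete, here is a deliberately tiny in-memory toy model in plain Python. It is not how Pixeltable is implemented (Pixeltable persists tables and views in its database). The `ToyPipeline` class, its methods, and the `split_words` splitter are all hypothetical. The toy only mimics how views behave: a newly created view is populated from the documents already in the base table, and each later insert expands only the new documents.\n",
    "\n",
    "```python\n",
    "from typing import Any, Callable, Dict, Iterable, List\n",
    "\n",
    "Row = Dict[str, Any]\n",
    "\n",
    "\n",
    "class ToyPipeline:\n",
    "    \"\"\"Hypothetical in-memory model of a base table feeding one-to-many views.\"\"\"\n",
    "\n",
    "    def __init__(self) -> None:\n",
    "        self.docs: List[Row] = []\n",
    "        self.views: Dict[str, Callable[[Row], Iterable[Row]]] = {}\n",
    "        self.view_rows: Dict[str, List[Row]] = {}\n",
    "\n",
    "    def create_view(self, name: str, expand: Callable[[Row], Iterable[Row]]) -> None:\n",
    "        # A new view is populated from every document already in the base table\n",
    "        self.views[name] = expand\n",
    "        self.view_rows[name] = [row for doc in self.docs for row in expand(doc)]\n",
    "\n",
    "    def insert(self, new_docs: List[Row]) -> Dict[str, int]:\n",
    "        # Only the new documents are expanded; existing view rows are left untouched\n",
    "        added = {'docs': len(new_docs)}\n",
    "        for name, expand in self.views.items():\n",
    "            new_rows = [row for doc in new_docs for row in expand(doc)]\n",
    "            self.view_rows[name].extend(new_rows)\n",
    "            added[name] = len(new_rows)\n",
    "        self.docs.extend(new_docs)\n",
    "        return added\n",
    "\n",
    "\n",
    "def split_words(doc: Row) -> List[Row]:\n",
    "    # Hypothetical stand-in for a document splitter: one output row per word\n",
    "    return [{'text': word} for word in doc['text'].split()]\n",
    "\n",
    "\n",
    "pipeline = ToyPipeline()\n",
    "pipeline.insert([{'text': 'first document'}])\n",
    "pipeline.create_view('words', split_words)  # backfills 2 rows from the existing doc\n",
    "print(pipeline.insert([{'text': 'a second new doc'}]))  # {'docs': 1, 'words': 4}\n",
    "print(len(pipeline.view_rows['words']))  # 6\n",
    "```\n",
    "\n",
    "The per-view counts returned by the toy `insert` play roughly the same role as the `Inserting rows into ...` progress lines printed by the real insert."
   ]
  },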
  {
   "cell_type": "markdown",
   "id": "bdf1e13c-b038-40ee-b48c-35d514149c62",
   "metadata": {},
   "source": [
    "## Further Experiments\n",
    "\n",
    "This is a good time to mention another important guiding principle of Pixeltable. The preceding examples all used the built-in `DocumentSplitter` class with various configurations. That's a reasonable choice for a first cut or a quick prototype, and it may be all that some applications need. Others will want more sophisticated chunking, implementing their own specialized logic or leveraging third-party tools. Pixeltable doesn't constrain the AI or RAG operations a workflow can use: the iterator interface is highly general, so it's straightforward to implement new operations or to adapt existing code and third-party tools to a Pixeltable workflow."
   ]
  },
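  {
   "cell_type": "markdown",
   "id": "rag-ops-chunker-sketch",
   "metadata": {},
   "source": [
    "As an illustration of the kind of specialized logic an application might bring, here is a minimal plain-Python sketch of a custom chunker. It is a generator that greedily packs sentences into chunks of at most `max_chars` characters and repeats the last `overlap` sentences of each chunk at the start of the next one. The function `pack_sentences`, its parameters, and the naive regex sentence split are hypothetical choices for this sketch and are not part of Pixeltable. Plugging such logic into a workflow would go through Pixeltable's iterator interface, which this sketch does not attempt to reproduce.\n",
    "\n",
    "```python\n",
    "import re\n",
    "from typing import Iterator, List\n",
    "\n",
    "\n",
    "def pack_sentences(text: str, max_chars: int = 500, overlap: int = 1) -> Iterator[str]:\n",
    "    \"\"\"Hypothetical chunker (illustration only).\n",
    "\n",
    "    Greedily packs sentences into chunks of at most max_chars characters,\n",
    "    repeating the last `overlap` sentences of each chunk at the start of the\n",
    "    next one. A single sentence longer than max_chars becomes its own chunk.\n",
    "    \"\"\"\n",
    "    # Naive split after '.', '!' or '?' followed by whitespace\n",
    "    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\\s+', text) if s.strip()]\n",
    "    current: List[str] = []\n",
    "    for sentence in sentences:\n",
    "        if current and len(' '.join(current + [sentence])) > max_chars:\n",
    "            yield ' '.join(current)\n",
    "            # Carry trailing sentences over for context, unless that would overflow\n",
    "            current = current[-overlap:] if overlap > 0 else []\n",
    "            if len(' '.join(current + [sentence])) > max_chars:\n",
    "                current = []\n",
    "        current.append(sentence)\n",
    "    if current:\n",
    "        yield ' '.join(current)\n",
    "\n",
    "\n",
    "chunks_out = list(pack_sentences('One. Two two. Three three three. Four.', max_chars=30))\n",
    "# ['One. Two two.', 'Two two. Three three three.', 'Three three three. Four.']\n",
    "```\n",
    "\n",
    "In this example, each chunk after the first begins with the final sentence of the previous one. This is a common way to preserve some context across chunk boundaries, at the cost of some duplicated text across chunks."
   ]
  },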
  {
   "cell_type": "markdown",
   "id": "709a4be2",
   "metadata": {},
   "source": [
    "## Computing Embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa455441-bfd1-41cc-a802-a396a2323c4f",
   "metadata": {},
   "source": [
    "Next, let's look at how embeddings can be added seamlessly to an existing Pixeltable workflow. To compute them, we'll use the Hugging Face `sentence-transformers` package, running it over the `chunks` view, which broke our documents up into larger paragraphs. Pixeltable has a built-in `sentence_transformer` adapter, and all we have to do is add a new column that uses it. Pixeltable takes care of the rest, computing the new column's values for all existing rows in the view."
   ]
  },
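  {
   "cell_type": "markdown",
   "id": "rag-ops-embedding-sketch",
   "metadata": {},
   "source": [
    "For intuition about what each cell of the new column will hold, the following sketch calls the `sentence-transformers` library directly, outside Pixeltable, with the same model ID. It assumes that the package installed at the top of this tutorial is available and that there is network access to download the model on first use. Pixeltable's adapter may batch or configure the model differently, so the sketch only illustrates the shape of the result: one fixed-length vector per chunk (384 dimensions for this MiniLM model).\n",
    "\n",
    "```python\n",
    "from sentence_transformers import SentenceTransformer\n",
    "\n",
    "# Same model ID that the next cell passes to Pixeltable's adapter; downloaded on first use\n",
    "model = SentenceTransformer('paraphrase-MiniLM-L6-v2')\n",
    "\n",
    "sample_chunks = [\n",
    "    'Chagall was born in 1887, into a Jewish family near Vitebsk.',\n",
    "    'Using the medium of stained glass, he produced windows for several cathedrals.',\n",
    "]\n",
    "# encode() returns a NumPy array with one row (embedding) per input string\n",
    "embeddings = model.encode(sample_chunks)\n",
    "print(embeddings.shape)  # (2, 384)\n",
    "```"
   ]
  },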
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "e0d7ea4a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing cells: 100%|███████████████████████████████████████| 481/481 [00:01<00:00, 246.79 cells/s]\n",
      "Added 481 column values with 0 errors.\n"
     ]
    }
   ],
   "source": [
    "from pixeltable.functions.huggingface import sentence_transformer\n",
    "\n",
    "chunks['minilm_embed'] = sentence_transformer(chunks.text, model_id='paraphrase-MiniLM-L6-v2')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65bb3b1b-04fd-422b-a070-d3faa16033e0",
   "metadata": {},
   "source": [
    "The new column is a *computed column*: it is defined as a function on top of existing data and updated incrementally as new data are added to the workflow. Let's look at how adding it changed the schema of the `chunks` view."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "3cf786cc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_de612 th {\n",
       "  text-align: center;\n",
       "}\n",
       "#T_de612_row0_col0, #T_de612_row0_col1, #T_de612_row0_col2, #T_de612_row1_col0, #T_de612_row1_col1, #T_de612_row1_col2, #T_de612_row2_col0, #T_de612_row2_col1, #T_de612_row2_col2, #T_de612_row3_col0, #T_de612_row3_col1, #T_de612_row3_col2, #T_de612_row4_col0, #T_de612_row4_col1, #T_de612_row4_col2, #T_de612_row5_col0, #T_de612_row5_col1, #T_de612_row5_col2, #T_de612_row6_col0, #T_de612_row6_col1, #T_de612_row6_col2 {\n",
       "  white-space: pre-wrap;\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_de612\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_de612_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
       "      <th id=\"T_de612_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
       "      <th id=\"T_de612_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row0_col0\" class=\"data row0 col0\" >pos</td>\n",
       "      <td id=\"T_de612_row0_col1\" class=\"data row0 col1\" >int</td>\n",
       "      <td id=\"T_de612_row0_col2\" class=\"data row0 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row1_col0\" class=\"data row1 col0\" >text</td>\n",
       "      <td id=\"T_de612_row1_col1\" class=\"data row1 col1\" >string</td>\n",
       "      <td id=\"T_de612_row1_col2\" class=\"data row1 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row2_col0\" class=\"data row2 col0\" >title</td>\n",
       "      <td id=\"T_de612_row2_col1\" class=\"data row2 col1\" >string</td>\n",
       "      <td id=\"T_de612_row2_col2\" class=\"data row2 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row3_col0\" class=\"data row3 col0\" >heading</td>\n",
       "      <td id=\"T_de612_row3_col1\" class=\"data row3 col1\" >json</td>\n",
       "      <td id=\"T_de612_row3_col2\" class=\"data row3 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row4_col0\" class=\"data row4 col0\" >sourceline</td>\n",
       "      <td id=\"T_de612_row4_col1\" class=\"data row4 col1\" >int</td>\n",
       "      <td id=\"T_de612_row4_col2\" class=\"data row4 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row5_col0\" class=\"data row5 col0\" >minilm_embed</td>\n",
       "      <td id=\"T_de612_row5_col1\" class=\"data row5 col1\" >array((384,), dtype=FLOAT)</td>\n",
       "      <td id=\"T_de612_row5_col2\" class=\"data row5 col2\" >sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2')</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_de612_row6_col0\" class=\"data row6 col0\" >source_doc</td>\n",
       "      <td id=\"T_de612_row6_col1\" class=\"data row6 col1\" >document</td>\n",
       "      <td id=\"T_de612_row6_col2\" class=\"data row6 col2\" ></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "view 'chunks'\n",
       "\n",
       " Column Name                       Type                                                  Computed With\n",
       "         pos                        int                                                               \n",
       "        text                     string                                                               \n",
       "       title                     string                                                               \n",
       "     heading                       json                                                               \n",
       "  sourceline                        int                                                               \n",
       "minilm_embed array((384,), dtype=FLOAT) sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2')\n",
       "  source_doc                   document                                                               "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chunks"
   ]
  },
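  {
   "cell_type": "markdown",
   "id": "sketch-toy-computed-md",
   "metadata": {},
   "source": [
    "To make these incremental semantics concrete, the cell below models them in plain Python. `ToyTable` and its methods are hypothetical and were invented for this sketch, so this is not Pixeltable's implementation. Adding a computed column backfills it over the rows that already exist. Each later insert then evaluates the column for the new row only, which matches the behavior recorded by the `Computed With` entry in the schema above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-toy-computed-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy model of an incrementally maintained computed column.\n",
    "# Hypothetical illustration only; this is NOT how Pixeltable is implemented.\n",
    "from typing import Any, Callable, Dict, List\n",
    "\n",
    "\n",
    "class ToyTable:\n",
    "    def __init__(self) -> None:\n",
    "        self.rows: List[Dict[str, Any]] = []\n",
    "        # computed column name -> function that derives the value from a row\n",
    "        self.computed: Dict[str, Callable[[Dict[str, Any]], Any]] = {}\n",
    "\n",
    "    def add_computed(self, name: str, fn: Callable[[Dict[str, Any]], Any]) -> int:\n",
    "        # Register the column and backfill it over all existing rows\n",
    "        self.computed[name] = fn\n",
    "        for row in self.rows:\n",
    "            row[name] = fn(row)\n",
    "        return len(self.rows)\n",
    "\n",
    "    def insert(self, **values: Any) -> None:\n",
    "        # Evaluate every computed column for the new row only\n",
    "        row = dict(values)\n",
    "        for name, fn in self.computed.items():\n",
    "            row[name] = fn(row)\n",
    "        self.rows.append(row)\n",
    "\n",
    "\n",
    "toy = ToyTable()\n",
    "toy.insert(text='Marc Chagall was born in 1887.')\n",
    "toy.insert(text='He later lived in New York City.')\n",
    "backfilled = toy.add_computed('n_words', lambda r: len(r['text'].split()))\n",
    "toy.insert(text='Early life')  # only this new row is computed here\n",
    "print(backfilled, [r['n_words'] for r in toy.rows])"
   ]
  },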
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "bc2893b5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>text</th>\n",
       "      <th>heading</th>\n",
       "      <th>minilm_embed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>Marc Chagall - Wikipedia Jump to content Search Search</td>\n",
       "      <td>{}</td>\n",
       "      <td>[-0.262 -0.119 -0.133  0.048  0.12  -0.006 ... -0.556  0.372  0.468 -0.234 -0.226  0.164]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall 81 languages Afrikaans Alemannisch العربية Aragonés Արեւմտահայերէն Asturianu Azərbaycanca বাংলা Башҡортса Беларуская Беларуская (тарашкевіца) Български Català Čeština Cymraeg Dansk Deutsch Eesti Ελληνικά Español Esperanto Euskara فارسی Français Galego 한국어 Հայերեն हिन्दी Hrvatski Ido Bahasa Indonesia Interlingua Italiano עברית Jawa ქართული Kiswahili Latina Latviešu Lëtzebuergesch Lietuvių Magyar Македонски Malagasy مصرى Nederlands Nedersaksies 日本語 Norsk bokmål Norsk nynorsk Occitan Oʻzbekcha / ўзбекча پنجابی Picard Piemontèis Plattdüütsch Polski Português Română Runa Simi Русский Scots Shqip Sicilianu Simple English Slovenčina Slovenščina کوردی Српски / srpski Srpskohrvatski / српскохрватски Suomi Svenska ไทย Türkçe Українська Tiếng Việt Winaray 吴语 ייִדיש 粵語 中文 Edit links From Wikipedia, the free encyclopedia Russian-French artist (1887–1985) &quot;Chagall&quot; redirects here. For other uses, see Chagall (disambiguation) .</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "      <td>[-0.136  0.401 -0.53  -0.181 -0.453 -0.125 ... -0.184  0.122  0.644 -0.54   0.188  0.203]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall Chagall, c. 1920 Born Moishe Shagal ( 1887-07-06 ) 6 July 1887 (N.S.) Liozna , Vitebsk Governorate , Russian Empire (now Belarus) [1] Died 28 March 1985 (1985-03-28) (aged 97) Saint-Paul-de-Vence , France Nationality Russian Empire, later French [2] Known for Painting stained glass Notable work See list of artworks by Marc Chagall Movement Cubism Expressionism School of Paris Spouses Bella Rosenfeld ​ ​ ( m. 1915; died 1944) ​ Valentina (Vava) Brodsky ​ ​ ( m. 1952) ​ [3] Children 2 [4]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "      <td>[-0.005  0.34  -0.315  0.17  -0.124  0.384 ... -0.144 -0.131  0.104 -0.412 -0.195 -0.058]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall [a] (born Moishe Shagal ; 6 July [ O.S. 24 June] 1887 – 28 March 1985) was a Belarusian-French artist. [b] An early modernist , he was associated with the École de Paris as well as several major artistic styles and created works in a wide range of artistic formats, including painting, drawings, book illustrations, stained glass , stage sets, ceramics, tapestries and fine art prints.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "      <td>[ 0.053  0.138 -0.219  0.192 -0.1    0.234 ... -0.138 -0.294  0.306 -0.012 -0.059  0.007]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall was born in 1887, into a Jewish family near Vitebsk , today in Belarus , but at that time in the Pale of Settlement of the Russian Empire. Before World War I , he travelled between Saint Petersburg , Paris , and Berlin . During that period, he created his own mixture and style of modern art, based on his ideas of Eastern European and Jewish folklore. He spent the wartime years in his native Belarus, becoming one of the country&#x27;s most distinguished artists and a member of the modernist avant-garde , founding the Vitebsk Arts College . He later worked in and near Moscow in difficult conditions during hard times in Russia following the Bolshevik Revolution , before leaving again for Paris in 1923. During World War II , he escaped occupied France to the United States, where he lived in New York City for seven years before returning to France in 1948.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "      <td>[ 0.013  0.248 -0.692  0.143 -0.379  0.254 ... -0.232 -0.157 -0.018 -0.225 -0.208 -0.095]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Art critic Robert Hughes referred to Chagall as &quot;the quintessential Jewish artist of the twentieth century&quot;. According to art historian Michael J. Lewis, Chagall was considered to be &quot;the last survivor of the first generation of European modernists&quot;. For decades, he &quot;had also been respected as the world&#x27;s pre-eminent Jewish artist&quot;. [15] Using the medium of stained glass, he produced windows for the cathedrals of Reims and Metz as well as the Fraumünster in Zürich , windows for the UN and th ...... e experienced modernism&#x27;s &quot;golden age&quot; in Paris, where &quot;he synthesized the art forms of Cubism , Symbolism , and Fauvism , and the influence of Fauvism gave rise to Surrealism &quot;. Yet throughout these phases of his style &quot;he remained most emphatically a Jewish artist, whose work was one long dreamy reverie of life in his native village of Vitebsk.&quot; [16] &quot;When Matisse dies&quot;, Pablo Picasso remarked in the 1950s, &quot;Chagall will be the only painter left who understands what colour really is&quot;. [17]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;}</td>\n",
       "      <td>[-0.172  0.348 -0.307  0.034 -0.071  0.111 ... -0.31  -0.011  0.302 -0.273 -0.163  0.152]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Early life and education [ edit ]</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;}</td>\n",
       "      <td>[-0.213  0.418  0.094  0.135 -0.069  0.265 ... -0.548  0.164  0.075  0.205  0.309  0.277]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Early life [ edit ] Marc Chagall&#x27;s childhood home in Vitebsk , Belarus. Currently site of the Marc Chagall Museum . Marc Chagall, 1912, The Spoonful of Milk (La Cuillerée de lait) , gouache on paper</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "      <td>[-0.04   0.143 -0.357  0.412 -0.331  0.201 ... -0.006 -0.057  0.255  0.181  0.018 -0.021]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Marc Chagall was born Moishe Shagal in 1887, into a Jewish family in Liozna , [1] near the city of Vitebsk , Belarus, then part of the Russian Empire . [c] [18] At the time of his birth, Vitebsk&#x27;s population was about 66,000. Half of the population was Jewish. [16] A picturesque city of churches and synagogues, it was called &quot;Russian Toledo &quot; by artist Ilya Repin , after the cosmopolitan city of the former Spanish Empire . [19] Because the city was built mostly of wood, little of it survived years of occupation and destruction during World War II.</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "      <td>[ 0.123  0.198 -0.496  0.154 -0.368  0.078 ... -0.057 -0.141 -0.063 -0.096 -0.136 -0.232]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>Chagall was the eldest of nine children. The family name, Shagal, is a variant of the name Segal , which in a Jewish community was usually borne by a Levitic family. [20] His father, Khatskl (Zachar) Shagal, was employed by a herring merchant, and his mother, Feige-Ite, sold groceries from their home. His father worked hard, carrying heavy barrels, earning 20 roubles each month (the average wages across the Russian Empire was 13 roubles a month). Chagall wrote of those early years:</td>\n",
       "      <td>{&quot;1&quot;: &quot;Marc Chagall&quot;, &quot;2&quot;: &quot;Early life and education[edit]&quot;, &quot;3&quot;: &quot;Early life[edit]&quot;}</td>\n",
       "      <td>[-0.19   0.266 -0.4    0.129 -0.493  0.063 ... -0.194 -0.2    0.322  0.024 -0.068  0.031]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                                text  \\\n",
       "0  Marc Chagall - Wikipedia Jump to content Searc...   \n",
       "1  Marc Chagall 81 languages Afrikaans Alemannisc...   \n",
       "2  Marc Chagall Chagall, c. 1920 Born Moishe Shag...   \n",
       "3  Marc Chagall [a] (born Moishe Shagal ; 6 July ...   \n",
       "4  Chagall was born in 1887, into a Jewish family...   \n",
       "5  Art critic Robert Hughes referred to Chagall a...   \n",
       "6                  Early life and education [ edit ]   \n",
       "7  Early life [ edit ] Marc Chagall's childhood h...   \n",
       "8  Marc Chagall was born Moishe Shagal in 1887, i...   \n",
       "9  Chagall was the eldest of nine children. The f...   \n",
       "\n",
       "                                             heading  \\\n",
       "0                                                 {}   \n",
       "1                              {'1': 'Marc Chagall'}   \n",
       "2                              {'1': 'Marc Chagall'}   \n",
       "3                              {'1': 'Marc Chagall'}   \n",
       "4                              {'1': 'Marc Chagall'}   \n",
       "5                              {'1': 'Marc Chagall'}   \n",
       "6  {'1': 'Marc Chagall', '2': 'Early life and edu...   \n",
       "7  {'1': 'Marc Chagall', '2': 'Early life and edu...   \n",
       "8  {'1': 'Marc Chagall', '2': 'Early life and edu...   \n",
       "9  {'1': 'Marc Chagall', '2': 'Early life and edu...   \n",
       "\n",
       "                                        minilm_embed  \n",
       "0  [-0.2623971, -0.11875597, -0.1327094, 0.048251...  \n",
       "1  [-0.13631284, 0.40063256, -0.5300299, -0.18143...  \n",
       "2  [-0.0047689965, 0.33990884, -0.3152904, 0.1701...  \n",
       "3  [0.052763388, 0.13830872, -0.21864271, 0.19172...  \n",
       "4  [0.0128892455, 0.24784817, -0.69244295, 0.1426...  \n",
       "5  [-0.17184898, 0.34802842, -0.30670404, 0.03375...  \n",
       "6  [-0.21258691, 0.4176262, 0.09400387, 0.1349991...  \n",
       "7  [-0.04040356, 0.1428114, -0.3568075, 0.4118173...  \n",
       "8  [0.12289606, 0.19771104, -0.4960996, 0.1543681...  \n",
       "9  [-0.19016075, 0.26621026, -0.4000805, 0.129193...  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chunks.select(chunks.text, chunks.heading, chunks.minilm_embed).head()"
   ]
  },
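  {
   "cell_type": "markdown",
   "id": "sketch-cosine-md",
   "metadata": {},
   "source": [
    "Each `minilm_embed` value is a 384-dimensional float vector. In a RAG pipeline, these vectors are typically used for retrieval: a query is embedded with the same model and compared against the chunk embeddings. The cell below sketches brute-force cosine-similarity search in NumPy. It does not query Pixeltable. Random vectors stand in for the stored embeddings, and `top_k_cosine` is a hypothetical helper written for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-cosine-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def top_k_cosine(query: np.ndarray, embeddings: np.ndarray, k: int = 3) -> np.ndarray:\n",
    "    # Return the indices of the k rows of `embeddings` most cosine-similar to `query`.\n",
    "    # Normalize rows and query to unit length so a dot product equals cosine similarity.\n",
    "    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)\n",
    "    q = query / np.linalg.norm(query)\n",
    "    scores = emb @ q\n",
    "    # argsort is ascending, so reverse it to put the highest similarity first\n",
    "    return np.argsort(scores)[::-1][:k]\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "# Stand-in for 481 stored MiniLM embeddings (384 dimensions each); random, not real data\n",
    "chunk_vecs = rng.normal(size=(481, 384)).astype(np.float32)\n",
    "# A query vector that is a slightly perturbed copy of chunk 42\n",
    "query_vec = chunk_vecs[42] + 0.01 * rng.normal(size=384).astype(np.float32)\n",
    "print(top_k_cosine(query_vec, chunk_vecs, k=3))"
   ]
  },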
  {
   "cell_type": "markdown",
   "id": "258f0b29-aa1b-4292-8eb0-9dd3cd89d53b",
   "metadata": {},
   "source": [
    "Similarly, we might want to add a CLIP text embedding to our workflow; once again, it's just another computed column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "bc4811c4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing cells: 100%|███████████████████████████████████████| 481/481 [00:03<00:00, 137.11 cells/s]\n",
      "Added 481 column values with 0 errors.\n"
     ]
    }
   ],
   "source": [
    "from pixeltable.functions.huggingface import clip_text\n",
    "\n",
    "# Embed each chunk's text with CLIP's text encoder (512-dimensional output for this model)\n",
    "chunks['clip_embed'] = clip_text(chunks.text, model_id='openai/clip-vit-base-patch32')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "40f446a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_daae5 th {\n",
       "  text-align: center;\n",
       "}\n",
       "#T_daae5_row0_col0, #T_daae5_row0_col1, #T_daae5_row0_col2, #T_daae5_row1_col0, #T_daae5_row1_col1, #T_daae5_row1_col2, #T_daae5_row2_col0, #T_daae5_row2_col1, #T_daae5_row2_col2, #T_daae5_row3_col0, #T_daae5_row3_col1, #T_daae5_row3_col2, #T_daae5_row4_col0, #T_daae5_row4_col1, #T_daae5_row4_col2, #T_daae5_row5_col0, #T_daae5_row5_col1, #T_daae5_row5_col2, #T_daae5_row6_col0, #T_daae5_row6_col1, #T_daae5_row6_col2, #T_daae5_row7_col0, #T_daae5_row7_col1, #T_daae5_row7_col2 {\n",
       "  white-space: pre-wrap;\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_daae5\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_daae5_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
       "      <th id=\"T_daae5_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
       "      <th id=\"T_daae5_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row0_col0\" class=\"data row0 col0\" >pos</td>\n",
       "      <td id=\"T_daae5_row0_col1\" class=\"data row0 col1\" >int</td>\n",
       "      <td id=\"T_daae5_row0_col2\" class=\"data row0 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row1_col0\" class=\"data row1 col0\" >text</td>\n",
       "      <td id=\"T_daae5_row1_col1\" class=\"data row1 col1\" >string</td>\n",
       "      <td id=\"T_daae5_row1_col2\" class=\"data row1 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row2_col0\" class=\"data row2 col0\" >title</td>\n",
       "      <td id=\"T_daae5_row2_col1\" class=\"data row2 col1\" >string</td>\n",
       "      <td id=\"T_daae5_row2_col2\" class=\"data row2 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row3_col0\" class=\"data row3 col0\" >heading</td>\n",
       "      <td id=\"T_daae5_row3_col1\" class=\"data row3 col1\" >json</td>\n",
       "      <td id=\"T_daae5_row3_col2\" class=\"data row3 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row4_col0\" class=\"data row4 col0\" >sourceline</td>\n",
       "      <td id=\"T_daae5_row4_col1\" class=\"data row4 col1\" >int</td>\n",
       "      <td id=\"T_daae5_row4_col2\" class=\"data row4 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row5_col0\" class=\"data row5 col0\" >minilm_embed</td>\n",
       "      <td id=\"T_daae5_row5_col1\" class=\"data row5 col1\" >array((384,), dtype=FLOAT)</td>\n",
       "      <td id=\"T_daae5_row5_col2\" class=\"data row5 col2\" >sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2')</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row6_col0\" class=\"data row6 col0\" >clip_embed</td>\n",
       "      <td id=\"T_daae5_row6_col1\" class=\"data row6 col1\" >array((512,), dtype=FLOAT)</td>\n",
       "      <td id=\"T_daae5_row6_col2\" class=\"data row6 col2\" >clip_text(text, model_id='openai/clip-vit-base-patch32')</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_daae5_row7_col0\" class=\"data row7 col0\" >source_doc</td>\n",
       "      <td id=\"T_daae5_row7_col1\" class=\"data row7 col1\" >document</td>\n",
       "      <td id=\"T_daae5_row7_col2\" class=\"data row7 col2\" ></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "view 'chunks'\n",
       "\n",
       " Column Name                       Type                                                  Computed With\n",
       "         pos                        int                                                               \n",
       "        text                     string                                                               \n",
       "       title                     string                                                               \n",
       "     heading                       json                                                               \n",
       "  sourceline                        int                                                               \n",
       "minilm_embed array((384,), dtype=FLOAT) sentence_transformer(text, model_id='paraphrase-MiniLM-L6-v2')\n",
       "  clip_embed array((512,), dtype=FLOAT)       clip_text(text, model_id='openai/clip-vit-base-patch32')\n",
       "  source_doc                   document                                                               "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chunks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "695af1d1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "chunks.select(chunks.text, chunks.heading, chunks.clip_embed).head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
